# Web Adaptation

Qwen3 1.7B ONNX

Qwen3-1.7B is an open-source large language model with 1.7 billion parameters, released by Alibaba Cloud. It is based on the Transformer architecture and supports a range of natural language processing tasks.

Large Language Model

Whisper Large V3 Turbo

An ONNX-optimized version of the Whisper Large V3 Turbo speech recognition model, designed for web deployment.

Speech Recognition

Timesformer Base Finetuned K600

TimeSformer is a video understanding model based on the Transformer architecture, designed for video classification. This checkpoint is fine-tuned on the Kinetics-600 (K600) dataset.

Video Processing

Depth Anything V2 Base

Depth-Anything-V2-Base is an ONNX-format depth estimation model adapted for Transformers.js, designed for image depth estimation on the web.

Whisper Base.en

Whisper is a general-purpose speech recognition model trained by OpenAI using large-scale weak supervision. The base.en checkpoint is the English-only variant of the base model, so it transcribes English speech only; the multilingual Whisper checkpoints cover other languages.

Speech Recognition

MusicGen Small is a Transformer-based music generation model capable of producing high-quality music clips from text descriptions.

Audio Generation

An object detection model based on YOLOv9, adapted for Transformers.js so that it can run in the browser.

Object Detection

Depth Anything Large Hf

An ONNX version of the Depth Anything Large depth estimation model, packaged for use with Transformers.js and suitable for web applications.

Hubert Base Superb Ks

A voice command recognition model based on the HuBERT architecture, fine-tuned for the keyword spotting (KS) task of the SUPERB benchmark.

Audio Classification

Dpt Hybrid Midas

A hybrid depth estimation model developed by Intel that combines a convolutional neural network backbone with a Transformer architecture.

Nougat is a vision-based academic document understanding model that converts rendered pages of scientific PDFs into Markdown-formatted text.

Trocr Base Printed

TrOCR is a Transformer-based OCR model specifically designed for recognizing printed text.

Text Recognition

Trocr Small Printed

TrOCR-small-printed is a compact optical character recognition (OCR) model specifically designed for printed text recognition.

Text Recognition

Distilbart Cnn 12 6

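DistilBART-CNN-12-6 is a distilled version of BART fine-tuned for summarization on the CNN/DailyMail dataset. The "12-6" refers to its 12 encoder layers and 6 decoder layers; it is smaller and faster than the full BART-large model while largely retaining its summarization quality.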

Text Generation

YOLOS is an object detection model based on the Transformer architecture, designed for efficient visual task processing.

Object Detection

YOLOS-small is the small variant of YOLOS, a Transformer-based object detection model designed for efficient visual tasks.

Object Detection

E5-small-v2 is an efficient text embedding model suited to tasks such as semantic search, retrieval, and text similarity.

MMS-LID-4017 is a spoken language identification model developed by Facebook as part of its Massively Multilingual Speech (MMS) project. It identifies which of 4017 languages is being spoken.

Text Classification

MMS-LID-126 is a spoken language identification model released by Facebook as part of its Massively Multilingual Speech (MMS) project. It identifies which of 126 languages is being spoken.

Text Classification

Ast Finetuned Speech Commands V2

A voice command recognition model based on the Audio Spectrogram Transformer (AST) architecture, fine-tuned on Speech Commands v2 and packaged in ONNX format for web deployment.

Audio Classification

Ast Finetuned Audioset 10 10 0.4593

An Audio Spectrogram Transformer (AST) model fine-tuned on the AudioSet dataset for audio classification.

Audio Classification

Whisper Medium is a medium-scale speech recognition model developed by OpenAI, supporting automatic speech recognition (ASR) tasks in multiple languages.

Speech Recognition

Detr Resnet 101

An end-to-end object detection model (DETR) based on the Transformer architecture, using a ResNet-101 backbone as its feature extractor.

Object Detection

A large text summarization model based on the BART architecture, fine-tuned on the CNN/DailyMail dataset.

Text Generation

© 2025 AIbase